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Abstract 

We develop a lensless compressive imaging architecture, which consists of an aperture assembly and 
a single sensor, without using any lens. An anytime algorithm is proposed to reconstruct images from the 
compressive measurements; the algorithm produces a sequence of solutions that monotonically converge 
to the true signal (thus, anytime). The algorithm is developed based on the sparsity of local overlapping 
patches (in the transformation domain) and state-of-the-art results have been obtained. Experiments on 
real data demonstrate that encouraging results are obtained by measuring about 10% (of the image 
pixels) compressive measurements. The reconstruction results of the proposed algorithm are compared 
with the JPEG compression (based on file sizes) and the reconstructed image quality is close to the 
JPEG compression, in particular at a high compression rate. 

Index Terms 

Compressive sensing. Lensless compressive imaging, denoising, sparse representation, anytime. 


I. Introduction 


Compressive sensing [ 11, |2| is an emerging technique to acquire and process digital data 
such as two-dimensional images [j3j-[[5j|, hyperspectal images polarization images 0 


and videos [I0|-[21j. Compressive sensing is most effective when it is used in data acquisition: 
to capture the data in the form of compressive measurements |22j. Though the theory has been 
established over a decade ago 0, 0, [0, practical applications are still attracting researchers 
and engineers. With compressive measurements, images may be reconstructed with far fewer 
measurements than the number of pixels in the original images. Therefore, by using compressive 
sensing in acquisition, images are compressed while they are captured, avoiding high speed 
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processing, or transmission, of a large number of pixels. The single pixel camera [24], (251 
directly captures compressive measurements of an image, which is a camera architecture that 
employs a digital micromirror array to perform optical implementation of linear projections of 
an image onto pseudo-random binary patterns. It has the ability to obtain an image with a 
single detection element while sampling the image fewer times than the number of pixels. The 


same camera architecture is also used for Terahertz imaging |26], [271, and millimeter wave 
imaging [28 [ and X-ray imaging [29]. These cameras all make use of a lens to form an image 


in a plane before the image is projected onto a pseudo-random binary pattern. Lenses, however, 


severely constrain the geometric and radiometric mapping from the scene to the image [30j|. 
Furthermore, lenses add size, cost and complexity to a camera, especially at the non-visible 
bandwidth. 

In this paper, we present an in-depth description and mathematical analysis of a lensless 


compressive imaging architecture that was originally proposed in [3TJ. We have built a new 
version of lensless camera with new hardware under the same architecture, which consists of 
two components, an aperture assembly and a single sensor. No lens is used. The aperture assembly 
consists of a two dimensional array of aperture elements. The transmittance of each aperture 
element is independently controllable. The sensor is a single detection element, such as a single 
photo-conductive cell. Each aperture element together with the sensor defines a cone of a bundle 
of rays, and the cones of the aperture assembly define the pixels of an image. The sensor is used 
for taking compressive measurements. Each measurement is the integration of rays in the cones 
modulated by the transmittance of the aperture elements. The proposed architecture is different 


from the cameras of [24] and [30[. The fundamental difference is how the image is formed. In 


both [24] and [30], an image of the scene is formed on a plane, by some physical mechanism such 


a lens or a pinhole, before it is digitally captured (by compressive measurements in [24[, and by 


pixels in [30[). In the proposed architecture of this work, no image is physically formed before 
the image is captured. The proposed architecture is also related to, the traditional coded aperture 


imaging [32], [33]. Specifically, if the sensor number is on the order of the aperture element 
figures, the proposed architecture will be the coded aperture. However, only a single detector is 
used in our system. The proposed architecture is distinctive with the following features. 

. No lens is used. An imaging device using the proposed architecture can be built with reduced 


August 17, 2015 


DRAFT 
















3 


size, weight, cost and complexity. In fact, our architecture does not rely on any physical 
mechanism to form an image before it is digitally captured. 

No scene is out of focus. The sharpness and resolution of images from the proposed 
architecture are basically limited by the resolution of the aperture assembly (number of 
aperture elements), there is no blurring introduced by lens for scenes that are out of focusQ 
The same architecture can be used for imaging of visible spectrum, and other spectra such 
as infrared and millimeter waves. 

When multiple sensors are used in our system, they can be placed in arbitrary position 
and the scene is still in focus for each sensor. Therefore, it readily forms a multi-view 


system [341. 

• The proposed architecture can be used to capture hyperspectral images [7] and polarized 
images 0 by integrating related hardware. 

We built a prototype device for capturing images of visible spectrum. It consists of an LCD 


panel, and a photo-electric detector. Though the camera has been introduced in [311, [34], no 
algorithm has been developed particularly for this new camera and in this paper we present more 
details on both hardware and algorithm development. 

The proposed algorithm is an anytime algorithm [j35j. As defined in [36], “anytime algorithms 


are algorithms whose quality of results improves gradually as computation time increases”. 
Specifically, the solution in each iteration of our algorithm monotonically converges to the ground 
truth. In addition, we compare our new algorithm with JPEG compression at (roughly) the same 
compression ratio (based on the size of JPEG files). More importantly, we demonstrate that the 
proposed algorithm achieves similar quality of JPEG at high compression ratio. 

The rest of this paper is organized as follows: The architecture is described in Section |TT] 
and Section [TTI] derives the mathematical formulation. Our new algorithm is proposed in Sec- 
tion[IV]and Section [V] provides extensive simulation results. Section [Vljpresents the experimental 


hardware and real data results. Section VII summarizes the entire paper. 


'When the scene is far from the camera, each pixel covers a large area and the image will look blurry. However, this is not 
introduced by the lens as in a conventional camera. 
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Fig. 1: The proposed architecture. It consists of two components: an aperture assembly and an 
infinitesimal sensor of a single detection element. 


II. Architecture 

Figure |T| depicts the proposed architecture, which consists of two principle components: an 
aperture assembly and a single sensor (a photodiode), which measures the light intensity to 
capture grayscale images, or a tri-color sensor which measures intensity of each RGB channel 
to capture color images. The aperture assembly is made up of a two dimensional array of 
aperture elements (blocks on the aperture assembly in Figure [T]); the transmittance of each 
element {Aj\K^-=7 a can be individually controlled with (N x ,N y ) denoting the dimension of the 
aperture assembly. 

Each element of the aperture assembly, together with the sensor, defines a cone of a bundle 
of rays (Figure |Tj), and the integration of the rays within a cone is defined as a pixel value 
of the image. Therefore, in the proposed architecture, an image is defined by the pixels which 
correspond to the array of aperture elements in the aperture assembly. One possible way to 
measure an image with a single sensor is to capture the image pixel by pixel, which can be 
implemented by reading of the sensor when one of the aperture elements is completely open 
(.Pij = 1) and all others are completely closed — 0). The measurements are the 

pixel values of the image when the elements of the aperture assembly are opened one by one 
in certain scan order. This corresponds to the traditional representation of a digital image pixel 
by pixel. 
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With compressive sensing, it is possible to represent an image by using fewer measurements 
than the number of pixels [241, [251. The proposed architecture in Figure |T| aims to simplify the 
procedure of capturing compressive measurements. Recall in compressive sensing |2j 


y = Ax + n, 


( 1 ) 


where A e M MxAr is the sensing matrix, x is the desired signal (denoting vectorized image in 
this work and N = N x N y ), y e M m is the measurement and usually M A; n symbolizes the 
measurement noise. Each row of A defines a pattern for the elements of the aperture assembly, 
and the number of columns in a sensing matrix is equal to the number of total elements in the 
aperture assembly. Each value in a row of the sensing matrix is used to define the transmittance of 
an element of the aperture assembly. A row of the sensing matrix therefore completely defines a 
pattern for the aperture assembly, and it allows the sensor to make one measurement (one element 
in y) for the given pattern of the aperture assembly. The number of rows of the sensing matrix 
is the number of measurements, which is usually much smaller than the number of aperture 
elements in the aperture assembly (the number of pixels). Let A be a matrix whose entries are 
random numbers between 0 and 1. To make a measurement, the transmittance, P tJ , of each 
aperture element is controlled to equal the value of the corresponding entry in a row of the 
sensing matrix. The sensor integrates all rays transmitted through the aperture assembly. The 
intensity of the rays is modulated by the transmittances before they are integrated. Therefore, 
each measurement from the sensor is the integration of the intensity of rays through the aperture 
assembly multiplied by the transmittance of respective aperture element. A measurement from 
the sensor is hence a projection of the image onto the row of the sensing matrix. By changing the 
pattern of the transmittance of the aperture assembly, the compressive measurement is captured 
corresponding to a given sensing matrix. 


III. Mathematical Formulation 

In this section, we formally define what an image is in the proposed architecture and how it 
is related to the measurements from the sensor. In particular, we will describe how a pixelized 
image can be reconstructed from the measurements taken from the sensor. 
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A. Virtual Image on the Aperture Assembly 

The analog scene I(u,v ) can be defined on any plane between the scene and the sensor and 
for convenience, we here define the image on the aperture assembly. Considering one point on 
the aperture assembly, there is a ray starting from a point on the scene, passing through the point 
(u,v) on the aperture assembly, and ending at the sensor. Let r(u,v;t ) denote the intensity of 
this ray arriving at the sensor, passing through the aperture assembly (u, v ) at time t. The image 
point I(u,v) can be defined by the integration of the ray in a time interval At 

rAt 

I (u, v) = / r(u,v;t)dt. (2) 

Jo 

It is worth noting that I(u, v ) is continuously defined in the region of the aperture assembly and 
can be considered as an analog image. 

Similarly, define the continuous transmittance pattern of the aperture assembly as P(u, v ), The 
measurement collected by the sensor is the integration of the rays through the aperture assembly, 
modulated by the transmittance pattern P(u,v), 

z — JJ I(u,v)P(u,v)dudv. (3) 

Equation ([3]) defines the measurement of the sensor based on the continuous image in (|2]). In 
the following, we analyze how the pixel is defined and then we can get the discretized image. 


B. Pixelized Image 

Since only a single sensor is used in our system, the virtual image defined in Q can be 


pixelized by the aperture assembly, which is similar to the single-pixel camera [25]. Considering 
each element of the aperture assembly of size A u x A„, each pixel value of image can be 
represented by the integration of all the rays passing through the aperture element 

rj^v 


I(iJ) = 


f'i&.u 


' (i- 1)A U J (j—l)A v 


I(u, v)dudv. 


(4) 


Image {I(i, can be vectorized to a long vector I e M ;V , which is the target signal x in 

the compressive sensing model (|T|) and the image size is of N x x N y . 
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C. Compressive Measurement 


When the aperture assembly is programmed to implement a compressive sensing matrix, the 
transmittance P(i,j) of each aperture element is controlled to equal the value of the corre¬ 
sponding entry in the sensing matrix. For the m-th measurement, the entries in row m of the 
sensing matrix are used to program the transmittance of the aperture elements. Specifically, let 
the sensing matrix A be a matrix whose entries, a l 3 , are random numbers between 0 and 1. Let 
P m {i,j) be the transmittance of aperture element (i,j) for the m-th measurement. Following 
0. the m-th measurement can be represented as 


Zm 



N X Ny 

P m (u,v)I(u,v)dudv = EE I(i,j)P m (i,j), 

i= 1 7=1 


(5) 


where we consider that each aperture element, inside the region [(z—1)A U , iA u ] x [(j—l)A v ,jA v ], 
P m {u,v ) = P r "(i.j), is a constant programmed by the user. Let x denote the vectorized 
formulation of /, and then we have z m = J2 n a mtn x n . After taking M measurements, in the 
noiseless case, we can write the sensing process as 


z = Ax, (6) 

which is now the compressive sensing formulation as in (JTj) (if the noise is considered). Then, 
our problem becomes that given A (designed and known a priori ) and the measurements z, how 
to reconstruct the image x. A new algorithm, which explores the sparsity of the local region 
in a transformation domain, e.g., the DCT (Discrete Cosine Transformation) used in the JPEG 
compression, is proposed below to achieve the state-of-the-art reconstruction. 


IV. Reconstruction Algorithm 

The theory developed for compressive sensing 0 requires that the target signal x is sparse and 
following this, researchers have extended the sparsity of the signal in a transformation domain. 
For example, the wavelet transformation [5j is generally used in the image reconstruction of 
compressive sensing. Since the wavelet is usually imposed globally on the entire image, we 


term this as global transformation (the Total Variation used in [37] is also performed globally). 
Under these transformations, the coefficients usually have the same (or similar) number of the 
image pixels and these coefficients are approximately sparse (or compressible [|5]). A variety 
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of algorithms (see references in f38| ) have been developed to explore the sparsity of these 
coefficients. Let T denote the basis of transformation, the signal x can be represented as 


x = TO, (7) 

where 6 denotes the coefficients in the transformation domain. Plug ([7]) into ([!]), we have 

y = AT e = R0. (8) 


Most algorithms have been developed (for example [39], [40]) to solve: 


min ||0||i, subject to y = R 6, 


(9) 


or the variates of this problem with || ■ ||i denoting the l\ norm. Advantages of these algorithms 
include fast computation and low memory cost by assuming that T is easily invertible. When T is 
an orthonormal matrix, such as the wavelet transformation, solving 6 in (|9]) is equivalent to solve 
x. However, in other cases as stated below, there may be no one by one correspondence between 
x and 6. Recently, researchers have found that by exploiting the local (region) sparsity of the 
image can achieve better results than the global transformation methods, examples including the 


low-rank regularize!' [41] and the denoising based method [38]. 


A. Exploring the Local Sparsity 


Inspired by the JPEG compression and the emerging dictionary learning algorithms [42 [ for 
local patches, researchers have developed algorithms based on the sparsity of the local patches 
in specific basis or learned dictionaries [43]]. Consider a general case, in which T in ([8]) is now 
not a linear independent basis, but a more general dictionary, D e W xp and usually p PP q. 
Let X e q x N p denote the patch formulation of the image x, with n denoting the vectorized 
patch length (e.g., a <Jq x two dimensional patch) and N p symbolizing the number of 
patches extracted from the image x. We can write X = Qx, where Q denotes an extraction and 
permutation matrix. Under the dictionary D, 


X = DS, 


GO) 


where S e W xNp is a matrix whose columns are the coefficients of each patch and it is usually 
sparse. Recall the key of compressive sensing is to find a sparse basis and with the formulation 
in ( [TO] ), D is the basis and S plays the role of sparsity. 
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B. Formulation of the Reconstruction 

By adapting the above formulation, the compressive sensing problem is no longer the same 
as in ([9]), but can be formulated as an iterative two-step procedure: 

• Step 1: To minimize the following objection function 

J(x) = \\y - Ax\\ 2 2 . (11) 


Step 2: To solve the following minimization problem 

subject to X = DS. (12) 

It is worth noting that in step 1, we don’t need x to be sparse, but when A is a compressive 


min IISlli, 
s 


sensing matrix, a regulizer term is needed, e.g., the TV used in [44]. An alternative solution is 


to use the majorization-minimization (MM) approach [45], which will be described below. In 


step 2, when D is given, (12) is the conventional sparse coding problem (or, for each column 
of S, it is compressive sensing problem). 


C. Proposed Algorithm: SLOPE 

Different from the formulation in ([9]), for which diverse algorithms have been developed to 
solve the unique 6, thus to obtain x, in the above formulation, we aim to get x directly and this 
is also the final target of reconstruction. Therefore, step 2 can be recognized as a denoising step, 


while step 1 aims to update x. In this paper, we solve step 1 with the Euclidean projection [35], 
which, under the condition of the sensing matrix (Hadamard matrix) used in our camera, is same 
as the iterative shrinkage/thresholding (1ST) derived from the MM. However, we prove that a 
larger range of the step-size (than the 1ST) still leads to good convergence. 

1) Update x k : Under the compressive sensing framework, < p~T] ) has a solution in closed form, 
which is to use pseudo-inversion x = A T (AA T ) -1 y. By using the MM approach to minimize 
J(x), we can avoid solving a system of linear equations. At each iteration k of the MM approach, 
we should find a function G k {x) that coincides with J{x) at x k but otherwise upper-bounds 
J{x). We should choose a majorizer G k {x ) which can be minimized more easily (without having 
to solve a system of equations). The G k (x) is defined as 

G k (x) = ||y - Aas||| + (x - x k ) T {j]I - A T A)(:r - x k ), (13) 
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where I denotes the identity matrix and r/ must be chosen to be equal to or greater than 
the maximum eigenvalue of A T A. For the Hadamard sensing matrix used in our camera, the 
maximum eigenvalue of A T A is easily obtained (77 > 1). The update equation of x k is given 
by: 

x k+1 = x k + -A T (y - Ax k ). (14) 

V 

The GAP algorithm, proposed in p5|, which has been demonstrated high performance in video 


compressive sensing [ 11 ], has the following update equation: 

**+1 = x k + A t (AA T )~ 1 (y - Ax k ) 


(15) 


Under some condition of the sensing matrix A, as the Hadamard matrix used in our system, 
AA T is the identity matrix and thus (15) is same as ( fl~4| ) with 77 = 1. 

Based on the above two methods, we propose a more general update equation for x k : 


x k+l = x k + £A (AA ) 1 (y - Ax 


(16) 


where £ is the step-size and we provide the step-size selection with convergence guarantee in 
Theorem [H 

2) Denoising: Next, we consider the problem in ( [T2| ), which can be recognized as a denoising 
problem, and different from the previous algorithms [ |45| , the denoising is now performed on the 
patches of the image obtained by ( fl4| ). Though various algorithms has been developed for this 
denoising algorithm and the dictionary learning approach has achieved state-of-the-art results, 
the computational cost is always high and thus not efficient. On the other hand, the patches 


based transformation method [46] can provide excellent results efficiently. Inspired by this, and 
the success of JPEG compression, we here propose that, instead of learning a new dictionary, 
the DCT transformation will be used on the overlapping local patches for denoising. We will 
demonstrate that, by using this and one more clustering step on these patches, we achieve better 
results than the advanced algorithm in [380 The efficiency of the transformation based denoising 
compared with the dictionary learning algorithm is the fast inverse transformation. It is worth 


"The DAMP algorithm [381 may provide better results when the sensing matrix is Gaussian. However, here we focus on the 
Hadamard sensing matrix implemented in our system, which is realistic. 
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noting that when the transformation is performed on the onverlapping patches, the formulation 
in (|9]) does not fit anymore and thus we are using (12) here, which can be reformulated as: 


x = WTck, 


(17) 


where W is the average matrix for the same pixels in different overlapping patches and T is 
the inverse transformation matrix and ol is thus the vector of coefficients based on local patches. 
Note that though WT is a fat matrix (more columns than rows), its pseudo-inverse can be 
computed efficiently; we can easily obtain x from ol and vice verse. This is the key that the 
proposed algorithm is efficient. 

3) Clustering Patches: When better results are desired, we can achieve sparser representation 
of the image by clustering the patches into different groups, and in each group, we can perform 
a 3D transformation ( e.g ., 2D DCT in space and a wavelet on the 3rd dimension), which can 
be represented as 

* (c) = W(T 3 ® T 2 <g> Ti)a (c \ (18) 


where (c ' } denotes the c-th cluster and Ti, TV T 3 symbolize the transformation bases in the first, 
second and third dimension, respectively. a (c > is coefficients of patches in the c-th cluster This 
clustering procedure can be implemented by A-means or block matching approaches. Note that 


(18) is invertible; we can easily obtain x' 0 ' 1 from (y} c) and vice verse. 

The next step is to perform denoising (shrinkage) on ol, which can be done using the soft- 
thresholding pOt . However, how to select the threshold is always a problem and usually a 
cross-validation is required. For the algorithm proposed here, the thresholding is performed on 
each cluster and thus is more challenging. We propose an efficient way to select the threshold 
below. 


4) Determination of the Threshold: The GAP algorithm [35] enjoys the anytime property 
by using a particular way to threshold the coefficients. The basic idea to keep the non-zero 
coefficients as the same (or related) number of the measurement. When the orthonormal trans¬ 
formation is used, the coefficients have the same dimension of x. However, in our case, when 
the overlapping patches is used, there are far more coefficients than the dimension of x, which 
is implicitly represented by W in ( [XT] ). Therefore, we extend the method in [ f35| by keeping the 
same compressive sensing ratio for each cluster in the coefficients. Specifically, considering the 
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compressive sensing ratio (CSr) defined by 

number of row in A 

— 


(19) 


number of column in A 
we keep the non-zero number of coefficients in each cluster in proportion to (CSrxthe total 
number of coefficients in this cluster). We have found that this is very efficient in both simulation 
and real datasets. This can also be seen as an adaptive threshold \^ } imposed on the coefficients 
for each cluster at every iteration: 

Pk ] = « 


(c) _ 

l © max 


A c ) 


a 


(c) 


,0 


( 20 ) 


which is a shrinkage/thresholding operation [47] and 0 denotes the element-wise (Hadamard) 


product; k symbolizes the iteration and (0 signifies the cluster number. This implies that 


45 = 


a 


(c) 

k,i 


\ ( c ) 

1 _ A fc 

i«gi 


if 


o, 


I Ac) I > \( c ) 

Ffc.il ^ A fc > 


otherwise, 


( 21 ) 


(c) (c) 

where a k i is the /-1h element of ct k ,\/i e c-th cluster. The method described above provides 


an efficient way to select A 


(c) 


D. Summary of SLOPE 


The optimization problem investigated here based on local overlapping patches in (18) as well 
as the global transformation based approach in (|9]) can be summarized as: 


min||a||i, subject to y = AElct, 


( 22 ) 


CX 

where EL symbolizes the transformation or basis and x = ELct. For the proposed SLOPE 
algorithm, the average matrix W is also manifested in this EL. 

The proposed algorithm can be summarized as an iterative two-step approach: 


^k-\-1 

= x k + £A t (AA t ) 1 (y- Ax k ), 

(23) 


AA = 1 x k + £A T (y - Ax k ), 


(24) 

M5 

a fc } © max /1 |J c )|’°j 

1' 

(25) 


where 
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(c) 

ct k is obtained from x k via the transformation on overlapping patches in each cluster as 


shown in (18). 


x k is obtained from {/3^} by transforming {} back to the image domain. That is, x k 


(c)i 


is obtained from (18) by replacing ol by (3. 


Since our algorithm is based on the shrinkage the coefficients of local overlapping patches, we 
term it as SLOPE (Shrinkage of Local Overlapping Patches Estimator), which is summarized in 
Algorithm |T] and the ‘local’ here denotes that the transformation is performed on local patches, 
rather than the global transformation performed on the entire image, e.g., the wavelet. 


Algorithm 1 SLOPE 


Require: Measurements y, sensing matrix A, and 
l: Initial x 0 = A T (AA T )“ 1 ?/. 

2 : for k = 1 to Max-Iter do 

3: 

4: 

5: 

6 : 

7: 


Update x k by Eq 
Extract the overlapping patches from x k . 

Cluster patches into C clusters based on similarity (if necessary). 

Obtain the coefficients of each cluster (a^}^ =1 by imposing the transformation on the 
patches in the same cluster. 

Obtain the shrinked coefficients via the shrinkage/thresholding operation in (25). 


8 : Update x k by transforming {0 C) } ( <L\ back to the image (pixel) domain. 

9: end for 


When we write the optimization problem as in (22), it seems the same as (|9]). However, 


significant difference exists when we solve them. For the problem in (22), existed algorithms 
usually solve the coefficients, 6 , directly, instead of the desired signal x, as they assume there is 
one-by-one correspondence (each coefficient contributes equally to the signal) between 6 and x 
(since wavelet is usually used). However, when the overlapping patches are used, it is different 
to update x as in ( [23] ) from updating 0, as each coefficient is weighted differently (one pixel 
corresponds to several different coefficients). We unveil the difference below. Consider updating 
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G directly, and the solution to the first step ( p~T| ) will be 

Gk+i = d k + £R T (RR T )-\y-RG k ). 


(26) 


Note that R is a matrix including the sensing matrix A, the pixel averaging matrix W and the 
transformation matrix (or the dictionary); R = AWT. This leads to that RR T is not an identity 
matrix and it is not easy to explicitly write this matrix. Therefore, updating x directly as in ( |23| ) 
is more straightforward and saves a lot of computational workload as well as memory. On the 
other hand, because of the overlapping patches, the coefficients are not equally important and 
W imposes weights for each coefficient. Left-multiplying WT on 


will lead to ( [23] ): 

WT0 fc+ i = WT0fc + (WT(AWT) t ((AWT)(AWT) t ) _1 (i/ — AWT0^), (27) 

1 “1 “ (WW t (WW t ) _1 A t (|/ - x k ) = x k + £A T (y - Ax k ), (28) 


where we have used TT t = I and AA T = I and note that WW T is a diagonal matrix [411 
with each diagonal element in (0,1]. 

If (RR T ) -1 is not used as in (26), it will be 

= G k -\-£,R T {y — RGk)■ 


G 


fc+i 


(29) 


This will bias the solution of G, since it treats each coefficient equally [481. After several 
iterations, the error will be accumulated and therefore the results are not as good as updating x 


directly. To see this explicitly, left-multiplying WT on both sides of (29), we have 

WT 0 k+1 = WTG k +^WT(AWT) J (y-AWTG k ). 

x k+1 = x k + (WW t A t (i; - Ax k ), 


(30) 

(31) 


where we have used TT t = I and we can see from < f3~f[ ) that since WW T is not an identity 
matrix and thus it is different from ( |24| ). W plays the role of weighting each coefficient for 
the overlapping patches, since each pixel belongs to different patches. When the coefficients 
G are transformed back to pixels, each pixel should have equal weight (since they are equally 
important) and therefore, (RR T ) _1 in (26) balances this importance. However, when (29) is used, 
each pixel will have a different weight as in f3lj ) (because of WW T ) and the reconstruction 
error is thus introduced. 
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E. Convergence of SLOPE 

We now prove that under the lensless compressive imaging case considered in our system, 
AA T = I, SPLOE is an anytime algorithm; the solution sequence {x k }f =l is monotonically 
converging to the true signal, by selecting the proper X k at each iteration with a certain range 
of f. 

Consider the true image is x* and 

x* = ELct*, (32) 


with a* denoting the (true) sparse coefficients in the transformation domain. We need the 
following conditions to prove the anytime property of SLOPE: 

a) Initialization with 

x 0 = A T (AA T )- 1 y AA =~ 1 A T y, (33) 

b) For k -th iteration, select A/, such that 


ll/3fcl|i > II 


(34) 


where || ■ ||i denotes the Aj-norm, the summation of absolute values of each entry, 

c) Consider that in each iteration, X k is selected to keep at most rrif nonzero elements in /3 k , 
with m* Xk < N c , where N c is the number of coefficients (the dimension of ol or (3; it is 
much larger than N, the dimension of x, when the overlapping patch is used). We need 


the RIP (restricted isometry property) condition [23J, [49], [50] on R = AWT such that 

0 < S m * _| -k* < 1, (35) 

A fc 

where K* is the number of non-zero elements in at* and mt > K*. 

— 

To see the intuition behind (|33])-(|34|), we consider y = Ax* as a line (manifold) and x 0 is 
initialized by touching the line with a large (A ball (formed by a) (because y = Ax 0 ). At each 


iteration, we shrink the (A-ball by the operation in (25) but with the condition that this (A-ball 
formed by j3 k is larger than the true (A-ball formed by cC. Eventually, in the ideal case, the A)-ball 
formed by (3 k will be the same as the true A,-ball formed by ct*\ thus x k recovers the true signal 


x*. From the initialization in ( [33] ), we have y = Ax 0 = Ancto. Since y = Aa? 0 = A not* and 
a* is a minimizes we have || okq || i > || Q: *l|i’ an( f thus, there exists Aq such that ||/3 0 ||i > ||a*||i. 
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The condition in (34) can be relaxed to that, for each step, ||a fc ||i > ||a*|| 1 , so that X k exists to 


hold the anytime convergence of SLOPE. Please refer to [51] for an alternative interpretation. 
Theorem 1: Let R satisfy RIP, i. e. , there exists 0 < S* < N c and 0 < 5 s- < 1 such that 

(1 - 5 s *)|| cv s ||2 < ||R«a ! s ||2 < (1 + MIKIIL (36) 

for all 0 < s < S*, where ck s G M s , R s G M Mxs is formed from columns of R. Let a* be the 


true solution of (22) and its sparsity is K*. Let be the sparsity of (3 k given by (1251). If there 


(37) 


exists a sequence X k > 0 , such that 

||/3Ji > ||o£*||i, and m\ k + K* < S*, 

then ct k from Algorithm [I] (initialized by (33)) monotonically converges to a* for all £ G (0, 2); 
Algorithm |T| is an anytime algorithm. 

Proof: The full proof is presented in the Appendix. Here we review the main steps. Recall 
that the true signal is x* and the algorithm provides a sequence of solution x k at each iteration. 
We prove in the Appendix that: 

1) when £ G (0,2], \\x k — x*\\\ (||o£fe — a*|||) monotonically non-increases; 

2 ) when £ e (0,2), \\x k — x*\\\ (||— «*||1) monotonically decreases and converges to a 
constant. 

3) when £ e (0,2), with the RIP condition on R, \\x k — x*\\\ (||cKfc — a*|||) monotonically 
converges to zero. 


It is worth noting that the algorithm in (23)-(25) has the same formulation of the iterative 


shrinkage/thresholding algorithm (ISTA) [ |40| , [47] under the condition AA T = I. However, we 
have proved that when the step size £ G (0,2), ISTA is an anytime algorithm if the initialization 
and thresholds are selected as mentioned in our algorithm. 


Corollary 1: When A A = I, the ISTA used to solve the optimization problem in (22) is an 


anytime algorithm if it is initialized using (33) and the threshold for each step satisfies the same 
condition as SLOPE with the step size £ G (0,2). 

Proof: The proof follows Theorem |T] ■ 

Remarks: 
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Unlike the ISTA algorithm derived from the MM approach in ( |T4| ). which needs // > 
maxeig(A T A) (which equals the step size £ = ^), we only need the step size £ G (0,2). 
Usually, £ > 1 is used for fast convergence. 


The GAP algorithm proposed in [35] can be seen as a special case of our algorithm with 


£ = 1 as in (15). On the other hand, GAP is developed based on weighted group £ 2 ,i norm, 
and it does not require AA T = ifl 


In order to select the appropriate X k at each iteration, the method proposed in [35J can still 
be used. 


ct k = sort(| cx k |, ‘descend’), 

X k Q?fc,m* +1) 


(38) 

(39) 


where a k , m *+i is the (m* + l)-th entry of ct k , which sorts the absolute values of ct k from 
large to small. Then the sparsity of j3 k generated by Algorithm [I] will be m* = m* Xk in 
Theorem [I] We found in the experiments that setting rn* G [0.5iV c , N c — 1] always provides 


good results. Similar selection approach can also be found in [511, where the generalized-RIP 
is introduced and the adaptively iterative thresholding algorithm are proved to be converged 
linearly. 


Equation (34) is a sufficient (thus restricted) condition. Even we select a larger X k , the 
algorithm may still converge well. For instance, if we can select X k such that the support 


of (3 k ( J + in (53)) includes the support of a* (Z + in (52)) in each iteration, SLOPE will 


monotonically converge to zero. 

SLOPE explores the sparsity of local patches, while conventional compressive sensing 
inversion algorithms are often developed based on the sparsity of wavelet coefficients. 
However, the wavelet coefficients are usually not sparse, but compressible |5j. On the 
other hand, the DCT coefficients for overlapping patches are sparser than the wavelet 
coefficients, as the number of coefficients N c is much larger than the number of wavelet 
coefficients (recall that R G Similar case exists for the patch-based dictionary 


learning model [52] where the sparsity is imposed on coefficients. When the patch is small, 
usually, only a DC coefficient is sufficient to represent a single local patch. Therefore, it 


’Our proof can also be extended to the case which does not need AA T = I. 
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is more reasonable to define the sparse level K in (35) on the coefficients of overlapping 


patches. Similarly, since N c 'A> N, the selection of m* (\ k ) has a large degree of freedom. In 
addition, our theorem is not limited by the local patch based model; it also fits the wavelet 
transformation based algorithms. 

In the noisy case, SLOPE can also be used. It is worth noting that we only impose that j3 k is 
sparse, rather than « k . Therefore, the different between Ax k and Ax k can provide a good 


estimate of the (measurement) noise [35]. Experimental results on real data in Section VI 


verify the robustness of SLOPE under the noisy case. 

F Relation to ADMM 

The Alternating Direction Method of Multipliers (ADMM) algorithm [53] provides an alter¬ 
native solution to a lot of optimization problems. When the ADMM is utilized in our problem, 
the difference of ADMM compared with 1ST, GAP and SLOPE lies in how to update x as stated 


in Section |l V-C1 [ Under the ADMM formulation, introducing regulizers (6, c}, the cost function 

with (x = Wet). (40) 


of (22) is: 


C(x,x,a,b,c) = ^||y-A *||2 + ^\\x - x\\l + c||a|| 


ADMM cyclically solves the following subproblems: 


x k+1 \= argmin^||y - Aar||| + ^\\x - x k "' 2 

x Z Z 

x k+1 := argmin^llccfc+i - x\\j + c||a||i. 

X Z 


2 > 


(41) 

(42) 


While (42) can be solved using the same shrinkage/thresholding approach as described in 


Section |IV-C2[ we here focus on the update of x, to solve ( |4lj ). Given x k , ( |4T] ) is a quadratic 
optimization problem and x can be simplified to: 

(A T A + 61) a; = Ay + bx, (43) 

which admits the following closed-form solution: 

x k+1 = (A T A + biy 1 (A T y + bx k ). (44) 

Since in the CS framework, A is a fat matrix, (A T A + 61) will be a large matrix and thus the 
matrix inversion formula can be used to simplify the problem: 

x k+1 = [6 -i I - 6 -1 A T (I + A6 _1 A T ) _1 A6 _1 ] [A y + bx k ], (45) 
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In our case considered in the real system AA T = I, 


x k+ i x k + 


A (y — Ax k 
6 + 1 


(46) 


which is same as (|T5j) if b = ()(£ = 1). Comparing the update equation of x in (46) of ADMM 


with the update rule of our SLOPE algorithm in (24), we observe that: 


. The ADMM formulation of updating x is a special case of SLOPE with £ = y4_. 

Since b is usually selected to be a small number, the ADMM update rule is very similar to the 
case £ = 1. Therefore, the anytime property of SLOPE still holds if the ADMM updating rule 
is adopted. 


ground truth 



TVAL3, psnr: 23.2817 




GAP, psnr: 23.3256 



Proposed, PSNR: 25.1467 


DAMP, psnr: 24.8019 sHM, psnr: 23.1146 



Fig. 2: Simulation: reconstruction results of different algorithms at CSr= 0.1, image size 256 x 
256. 


V. Simulation Results 


To verify the performance of the proposed algorithm, we conduct SLOPE on some simulation 
datasets, which is summarized in Table [TJ Different from the simulation conducted in previous pa¬ 


pers [38], the sensing matrix used in our work is the permuted Hadamard matrix as implemented 
in our hardware. Therefore, the reported results may be different from them in other papers. The 
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TABLE I: Simulation: reconstruction PSNR (dB) of different images with diverse algorithms at 
various CSr. SLOPE is the proposed algorithm. 


Image 

CSr 

TVAL3 

GAP 

sHM 

DAMP 

SLOPE 


0.02 

16.91 

19.19 

8.28 

18.47 

20.06 



0.04 

19.37 

20.99 

18.37 

20.44 

21.72 



0.06 

21.48 

22.25 

20.90 

22.27 

22.69 



0.08 

22.84 

23.08 

21.46 

23.85 

24.42 


0.1 

23.42 

23.74 

23.34 

25.04 

25.15 


0.02 

17.39 

20.06 

9.28 

19.40 

20.85 

llt^y 


0.04 

20.01 

21.73 

19.24 

21.98 

22.63 

Irf' |f 


0.06 

22.23 

22.97 

21.59 

23.89 

24.76 

Vi 

1 

0.08 

22.76 

23.88 

22.47 

25.17 

25.60 


0.1 

23.79 

24.61 

24.13 

26.19 

26.37 


0.02 

14.99 

17.70 

9.85 

15.97 

18.37 



0.04 

17.04 

19.84 

17.23 

18.99 

20.45 





0.06 

18.99 

21.09 

18.98 

21.83 

23.66 

Klmm-iii.; . 1 




0.08 

20.12 

21.98 

20.80 

23.80 

24.55 




0.1 

20.82 

22.54 

21.44 

25.18 

26.41 



0.02 

16.80 

18.83 

9.70 

17.82 

20.00 

}JK3y 


0.04 

19.44 

20.85 

18.60 

20.21 

21.85 



0.06 

21.27 

22.25 

20.83 

22.10 

23.07 



0.08 

22.53 

23.31 

22.71 

23.69 

24.03 


0.1 

23.13 

24.19 

23.81 

25.33 

25.88 


proposed SLOPE algorithm is compared with the following four algorithms: 1) TVAL3 [;37j, 2) 
GAP based on wavelet [[35|, 3) DAMP [38] with BM3D denoising, and 4) sHM by exploiting 


the tree structure in wavelet Since when CSr= 0.1, very good results have been achieved 
for most images (Figure [2]), we here spend more efforts on the extremely low CSr, in particular 
CSr< 0.1. For all the simulated images used here, we resize them to size 256 x 256. For the 
RGB image, we use R, G, and B sensors to sample each channel separately. One example is 
shown in Figure [3j It can be observed that DAMP over-smooths the image while the proposed 
algorithm reserves more details. Different types of artifacts exist in other algorithms. Regarding 
the computation time, for each iteration, our algorithm takes about 0.28 seconds (at CSr = 0.1), 
which is similar to TVAL3 and GAP, and we found that 50 iterations are sufficient to present 
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ground truth 


TVAL3, psnr: 19.4449 



GAP, psnr: 20.8482 



Proposed, PSNR: 21.8486 



DAMP, psnr: 20.2148 



sHM, psnr: 18.5955 



Fig. 3: Simulation: reconstruction results with different algorithms at CSr= 0.04 (4% of the total 
pixel number, 256 x 256 x 3). 


decent results. One iteration in DAMP takes longer than our algorithm, about 2.83 seconds for 
the 256 x 256 x 3 RGB image. The patch size of 8 x 8 are used for all experiments and £ = 1.5 
is employed as the step size. 

A. Anytime Verification 

We next verify the anytime property of the proposed SLOPE algorithm by considering the 
“Lena” image of size 512 x 512. Similarly, the permuted Hadamard matrix is used. We test 
four values of the step size £ = {0.5,1,1.5, 2} by setting CSr = 0.05. The results are shown in 
Figure [4] The reconstruction errors and PSNRs at each iteration with different step-size are plotted 
in Figure [5] It can be seen that the reconstruction errors are decreasing monotonically for each 
iteration while the PSNRs are increasing for each iteration (especially when £ = {0.5,1,1.5}). 
Furthermore, we observe that a larger £ leads to faster convergence. In addition, a larger £ usually 
needs a larger m* to select the X k , thus to ensure the anytime convergence of SLOPE. We have 
observed in our experiments that when £ = 2, the PSNR of the reconstructed image sometimes 
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{ = 2, PSNR: 29.2331 £ = 1.5, PSNR: 29.2259 



Fig. 4: Simulation: reconstruction images with different step size for the 512 x 512 image. 
Results are obtained via running the proposed algorithm 100 iterations. CSr = 0.05. 




Fig. 5: Reconstruction error (left) and PSNR (right) with different step size for the 512 x 512 
image. CSr = 0.05. 
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does not increase monotonically, which is consistent with the range of step-size in Theorem [T] 


JPEG Compression, Quality 100, PSNR: 58.47 



JPEG Compression, Quality 100, PSNR: 58.47 



JPEG Compression, Quality 100, PSNR: 58.47 



JPEG Compression, Quality 1, PSNR: 22.0687 



JPEG Compression, Quality 20, PSNR: 29.3411 



Proposed, CSr: 0.0334, PSNR: 21.6318 



Proposed, CSr: 0.0460, PSNR: 22.5605 



Proposed, CSr: 0.10163, PSNR: 25.147 



Fig. 6: Example images: JPEG compared with the proposed SLOPE algorithm. 


B. Compare with JPEG Compression 

We now compare SLOPE under the compressive sensing framework with the JPEG compres¬ 
sion, which is based on the sparsity of the DCT coefficients in 8 x 8 blocks. We first use a PNG 
file as the truth and then use the script within MATLAB “imwrite(-)” by choosing 8-bits ‘jpeg’ 
compression with different qualities (100 denotes the highest quality). We treat the quality 100 
as the standard full file size. For the ‘Barbara’ image we used here, PSNR = 58.47dB (w.r.t. 
the PNG file) and the file size is 45.6KB at quality 100. The compressed image is obtained by 
changing the compression quality from 1 to 100 and we compare the file size with the full size 
at quality 100, computing the CSr used in this paper. 
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TABLE II: JPEG compression at different qualities compared with the proposed compressive 
sensing recovery 


JPEG Compression 

SLOPE Reconstruction 

Difference 

Quality 

Size (bytes) 

PSNR (dB) 

CSr 

PSNR (dB) 

PSNR (dB) 

i 

1,563 

22.0683 

0.0334 

21.6318 

0.4365 

3 

1,716 

22.7278 

0.0367 

21.8601 

0.8677 

5 

2,153 

24.4160 

0.0460 

22.5605 

1.8555 

7 

2,559 

26.0974 

0.0547 

23.1834 

2.9140 

9 

2,946 

26.9623 

0.0630 

23.5511 

3.4112 

10 

3.141 

27.2853 

0.0672 

23.8445 

3.4408 

12 

3.494 

27.8379 

0.0747 

24.1288 

3.7091 

14 

3,815 

28.2981 

0.0816 

24.4446 

3.8535 

16 

4,160 

28.6912 

0.0890 

24.7085 

3.9827 

18 

4,478 

29.0306 

0.0958 

24.9645 

4.0661 

20 

4,752 

29.3411 

0.1016 

25.1470 

4.1941 


Table [TT] summarizes the results of JPEG compression compared with the results obtained 
by our algorithm. This is a rough, high level comparison because JPEG also performs an 
entropy encoding after the DCT transform and quantization, while in our method, the number 
of compressive measurements is compared with the number of total pixels, and we did not 
consider the entropy coding on quantized measurements. When the compression is high (lower 
CSr), the gap between our approach and the JPEG compression is very small (< 0.5dB). 
When the compression gets lower, the gap becomes larger. One possible reason is that when 
JPEG is performed on the image, the truth is available and it is very easy to capture useful 
information from the truth. However, under the compressive sensing framework and using the 
current algorithm, increasing a few number of measurements can help the reconstruction, but 
not that significantly. Example images can be found in Figure [6} It can be seen that JPEG 
compression has obvious block artifacts while the results of the proposed algorithm become 
better progressively with increasing number of measurements. 

Furthermore, in JPEG compression, if we lose some bits, we may not be able to decode entire 
blocks. By contrast, in our compressive sensing framework, if we lose some measurements, we 
can still reconstruct the image, maybe not at a high fidelity. 
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Fig. 7: Prototype device. Left: the first generation in [31]; 


rigm: me current 




VI. Experimental Hardware and Real Data Results 


We have implemented our camera (Figure |7j) by using a transparent LCD monitor (with HD 
resolution |j] as the aperture assembly, which can be programmed by using a programing code 
implemented on a computer. One advantage of using the LCD monitor is that we can control the 
image resolution by merging the neighbor pixels into the same aperture element. We implemented 
our control code in C++ to program the LCD. The sensor used in our work is the same as in [310 
There are some practical issues when we process the real data, especially on the calibration of 
the sensing matrix A. Specifically, for the binary Hadamard matrix employed in our camera, we 
use the (0,1} entries, denoted as A + (as the ideal case), which differs the { — 1,1} (normalized 
to a constant) as generally used Hadamard matrix (since —1 can not be implemented in the 
hardware in a single step). Let A° denote the Hadamard matrix consisted of {—1,1}: 


A + 


A 0 + 1 

2 


(47) 


where 1 denotes the all-one matrix (every entry is 1) with the same size of A 0 . In the real 
system, however, even we assume that each element of the aperture assemble is uniform (the 
transmission rate is the same), there is still some light transmitting the aperture elements where 
we set to zero. Therefore, for the (i,j )~th element of the real sensing matrix A 


aKi if = 

/ if A+j = 0, 


(48) 


4 http://cry stal-display.com/products/transparent-lcd/ 

5 http://www.digikey.com/catalog/en/partgroup/tsl2571-evaluation-module/33050 
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where g is a constant to normalize the intensity of the light when the aperture element is 
programmed to one and / is a constant denoting the intensity of the light when the aperture 
element is programmed to zero. Fortunately, the fast Hadamard transformation can also be used 


by integrating (47) and (48). Moreover, g and / in (48) can be normalized to a constant. In 
all the real data, we set the step size £ = 1.5. These results demonstrate the robustness of the 
SLOPE algorithm in the noisy case, since noise always exists in the real measurements. 


til til 11 Mi til til til til lii 

CSr: 0.1, iter: 1 CSr: 0.1, iter: 2 CSr: 0.1, iter: 3 CSr: 0.1, iter: 4 CSr: 0.1, iter: 5 CSr: 0.1, iter: 6 CSr: 0.1, iter: 7 CSr: 0.1, iter: 8 CSr: 0.1, iter: 9 CSr: 0.1, iter: 10 

4 J4 kill j4 jJ J'4 J'4 J'4 J 

CSr: 0.15, iter: 1 CSr: 0.15, iter: 2 CSr: 0.15, iter: 3 CSr: 0.15, iter: 4 CSr: 0.15, iter: 5 CSr: 0.15, iter: 6 CSr: 0.15, iter: 7 CSr: 0.15, iter: 8 CSr: 0.15, iter: 9 CSr: 0.15, iter: 10 

CSr: 0.2, iter: 1 CSr: 0.2, iter: 2 CSr: 0.2, iter: 3 CSr: 0.2, iter: 4 CSr: 0.2, iter: 5 CSr: 0.2, iter: 6 CSr: 0.2, iter: 7 CSr: 0. 2, iter: 8 CSr: 0.2, iter: 9 CSr: 0.2, iter: 10 



CSr: 0.25, iter: 1 CSr: 0.25, iter: 2 CSr: 0.25, iter: 3 CSr: 0.25, iter: 4 CSr: 0.25, iter: 5 CSr: 0.25, iter: 6 CSr: 0.25, iter: 7 CSr: 0.25, iter: 8 CSr: 0.25, iter: 9 CSr: 0.25, iter: 10 



Fig. 8: Real data: reconstruction results of different CSr (each row) at each iteration (each 
column). The image is of size 64 x 64. 


A. Anytime Verification by Real Data 

We first consider the gray scale sensor and the image resolution of 64 x 64 to verify the 
anytime property of SLOPE. To capture compressive measurements, we use a sensing matrix 
which is constructed from rows of a Hadamard matrix of order N = 2 12 . Each row of the 
Hadamard matrix is permuted according to a predetermined random permutation. The scene is 
composed of a photo (of two persons) printed on a paper and we capture the measurements of 
this photo. Reconstruction results at various CSr using SLOPE are shown in Figure [8} For each 
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row, we plot results at different iterations. It can be seen that the results are getting better with 
increasing iterations (from left to right); thus, anytime. We further notice that only using 2 or 3 
iterations (the second and third column), SLOPE can provide descent results. This verifies the 
performance of the proposed algorithm. Moreover, our camera along with SLOPE can present 
very good results at CSr = 0.15. This further verifies the performance of our hardware system 
as well as the algorithm. 


Proposed 

SLOPE 


DAMP 


TVAL3 


GAP 


sHM 


CSr = 0.05 CSr = 0.1 CSr = 0.15 CSr = 0.2 CSr =0.25 CSr = 0.3 



Fig. 9: Real data: reconstruction results at different CSr with various algorithms. The image is 
of size 128 x 128. 


B. Compare with Other Algorithms 

Next, we consider the case with gray scale sensor and the image resolution of 128 x 128. To 
capture compressive measurements, we use a sensing matrix which is constructed from rows of a 
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Hadamard matrix of order N = 2 . Similar to Section |VI-A| each row of the Hadamard matrix 
is permuted according to a predetermined random permutation. The scene is composed of a 
photo printed on a paper and we capture the measurements of this photo. Example results using 
different numbers of measurement are shown in Figure [9} We compare the five algorithms used 
in the simulation. It can be seen that, similar to simulation results, SLOPE provides best results 
compared to other algorithms when CSr is small. Especially, at CSr = 0.05 and 0.1, SLOPE 
preserves many details of the face, for example, the left eye of “Lena”. DAMP introduces some 
“blob” noise because the BM3D denoising approach is used. Surprisingly, sHM now works better 
than TVAL3 and GAP. This is due to the following two reasons. Firstly, the tree structure in 
wavelet helps the reconstruction and secondly, the Bayesian framework developed in |5] is very 
robust to noise; it infers noise from the measurements. We further observed that the algorithm 
developed in [:5J is very helpful to remove spiky noise during reconstruction. 


CSr: 0.05 CSr: 0.1 CSr: 0.15 



CSr: 0.2 CSr: 0.25 CSr: 0.5 



Fig. 10: Real data: reconstruction results at different CSr with the proposed SLOPE algorithm. 
The image size is 217 x 302 x 3. 
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C. RGB Images 

Next we consider the RGB images captured by a tricolor sensor, with now a resolution of 
217 x 302 x 3. The sensing matrix is constructed from rows of a Hadamard matrix of order 
N = 2 16 and the first 65534 elements are used. The scene is the real scene of four books as 
shown in Figure [7] The reconstruction result is shown in Figure 10 with various compressive 
sensing ratio. 

Note that by using compressive sensing, we can save the sensors as well as the bandwidth. 
As stated before, we may progressively get better results by receiving more measurements. One 
application of compressive imaging is to get features in limited data by using a small bandwidth. 
From the results in Figure [9j we may identity high quality features from the reconstructed image 
at CSr around 0.1. If we want to get some details, for example, the book titles in Figure [lOj we 
may need CSr around 0.2. On the other hand, if we only need to identify that these are “books” 
in Figure [TOj CSr at 0.05 may be sufficient. 


VII. Conclusion 

An architecture for lensless compressive imaging is proposed. The architecture allows flexible 
implementations to build simple, reliable imaging devices with reduced size, cost and complexity. 
Furthermore, the images from the architecture do not suffer from such artifacts as blurring due 
to defocus of a lens. A prototype camera was built using low cost, commercially available 
components to demonstrate that the proposed architecture is indeed feasible and practical. A 
new compressive sensing reconstruction algorithm is proposed to achieve excellent results on 
both simulation and real data. The proposed algorithm enjoys the anytime property and thus 
provides better results as the computation increases. The algorithm is further compared with the 
JPEG compression and it demonstrates good results at high compression rates. Extensive results 
verified the performance of the algorithm as well as the imaging system. 


Appendix 

Proof of Theorem [I] For simplification, we consider the case without clustering patches, and 
it is readily to extend to the clustering patches case. We define the following operations for 
convenience: 
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Definition 1: 

x = H/3, 
ol = K~ x x, 


(49) 

(50) 


where TL 1 denotes the 2D or 3D transformation performed on the image patches including the 
pixel average matrix. 

We further define: 

Definition 2: 

@k = °-k + C fc) (51) 


since f3 k is obtained from the shrinkage of ot k . 

Since we assume ol* is sparse, we define the following set: 
Definition 3: 

1 + : Vi, that \a*\ > 0, 


where a* denotes the i-th entry of ol. 
We further define the sets: 
Definition 4: 


0 + • A7, that A fc 
2J— . Vi, that \ k A 


where a k ,i denotes the z-th entry of ot k . 


Proof: We start our derivation from (23) 


x k+l = x k + £A T (AA T ) 1 (y-Ax k ), 


(52) 


(53) 

(54) 


(55) 


and we first prove that, {||IIl}fc^=i I s a monotonically non-increasing sequence. 
Recall that y = Ax*, we have 

x k+l -x* = x k - x* + ^A T (AA T ) _1 (y - Ax k ) 


(56) 


It is equivalent to prove 
Following (|49l)-([50|). 


CKfc - OL 


* II 2 \oo 


\ oo 

J k=l 


is a monotonically non-increasing sequence. 


n-\x k+1 -x*) = 'H-\x k »x*)+i'R- 1 [A T {AA T )~ 1 A{x* ^x k )], (57) 
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which is equivalent to (please refer to (26)-(28)) 


cx-k+i — ct* = o k — a* + + £R 1 (RR t ) x R(a* — a k ) 


(58) 


Therefore, 

\\a k+1 - a *\\ 2 2 = \\o k -o*\\l + 2(o k -o*) T {C k + ZB T (im T r 1 R(o* ~(3 k )] 

+\\Ck + eR T (RR T )’ 1 R(«* - 0 k )\\l (59) 


What we want to prove is that 


|«fc+i 


o 


\oc k - o 


def n 

= r k < 0. 


(60) 


From (59), 


n = 2(a„ - a-) T [C„ + «R T (RR T )- 1 (a* - ft)] + ||C* + fR T (RR T )- 1 (a* - ft)||5 
= 2(ft -a'- C t ) T [C t + |R T (RR T )" 1 R(tt* - ft)]||C t + «R T (RR T )- 1 R(a’ - 0 k )\\l 
= 2(ft - a*) T [<* + 5R T (RR T )- 1 R(a* - ft)] - 2C T k [( t + «R T (RR T )-‘R(c>* - ft)] 

+ [C* + «R T (RR T )- 1 R(a* - ft)] T [C t + £R T (RR T )-'(“* - ft)] 

= 2(ft - a -) T C k + 2«ft - «-) T R T (RR T )- V* - ft) - IlCfclll + ||«R T (RR T )-'R(a* - 0 
= 2(ft - a-) T C t + l-HCJIl] + [« - l) 2 - l]||R T (RR T )- 1 R(a- - ft)||i. (61) 


There are three terms on the right-hand side of ( ]6T| ): 

• The second term [— HCfclll] is obviously non-positive. 

• The third term [(£ — l) 2 — 1] ||R t (RR t ) _1 R(q:* — /3 k )\\l is non-positive when £ 6 (0,2], 
given £ > 0. 

In the following, we only need to prove the first term is non-positive. 

2(/3 fc - at*) T C k = 2 © max jl - j~p° j _ © min |l, j~| |^) 

= 2 © max jl - |^©,oj^ (—o k © min jl, |~| — 2(ck*) t j — o k © min jl, j~| 

(62) 


where | • | is the absolute value performed on each element and 0 is the element-wise product. 


Therefore, for the values that are not in Z+, the second term of (62) will be zero. For the 


values in £71, the first term of (62) will be zero. 
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With the definitions of J+, and X + : 


2(/3 fc - ol*YC k = 2 Y^ k ~ |afc,i|)Afc + 2A fc Y a-sign (a kii ) + 2 Y a*a kji 

i£j+ i£j+ i£j- 


2Afc l (Afc - \a k ,i\) + Y <sign(a fe)i ) + Y 


<x- 


*ej+ i£j+ iej- k 

Given X k > 0, if we want to prove 2 (/3 k — a*) T £ k < 0, we need: 


- k,;D + Y1 «* si g n («fc,*) + Y1 - °’ 

i£j+ iej+ i£j- k 

which is equivalent to 

Y assign (a k ,i) + Y a *Y^ - Y (l a *>*l _ Afc )‘ 


i&J+ 


i&J- 


i&J+ 


(63) 


(64) 


(65) 


Lemma 1: ||a:*||i is an upper bound of Yhi^j+ a*sign(a fcji ) + Yhi^j_ a* 


Proof: 


Therefore 


Y a*sign(a M ) = Y «* si g n («fc,*) < Y 1“** 

i£j+ i£j+nl+ i&J+r\T + 


E 

i&J- 


* a k,i ^ \ | * 


A i 


i&J- 


| Oik,', 

A i, 


< 


E 

i&J-PCL + 


Y a i s ign(Q!fe,i) + Y 


* a k,i , || * || 

Oij < OL A. 


iej+ 


i£j- 


Ai 


( 66 ) 

(67) 

( 68 ) 


Along with (65), if we select \ k in each iteration such that ||/3 fc ||i > ||ok*||i, the first term 


of ( [67] ) will be non-positive. Along with the other two non-positive terms, we have proved that 
r k < 0. Till now, we have proved that {11 a k — a*\\ 1 )}k=\ ' s a rnonotonically non-increasing 


sequence if (34) is satisfied and £ 6 (0,2]. 

In the following, we prove that {||ck^ — ck*|||} converges to zero. We first prove that {|R«/, 


R«*|||} converges to zero, and then with the RIP (restricted isometry property) condition [231, 


[49], [501 on R, a. k converges to the true signal a*. 


From (61), when £ e (0, 2], 

r k = 2 (p k - a*) T C k + [-HCfella] + [(£ - l) 2 - 1]||R T (RR T )- 1 R(«* - (3 k ) f 2 < 0, (69) 
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if the constrain in (34) is satisfied. Now we tight the constrain of £ to £ € (0,2), and the third 
term of r k : 


[(£ _ l) 2 - 1]||R T (RR T ) _1 R(q:* - /3 fc ) ||1 < 0 (r fc <0) 
or ||R T (RR T )' 1 R(a* - (3 k )\\l = 0. 


(70) 

(71) 


Separately consider the following two cases: 

1) ||R T (RR T ) -1 R(a:*— {3 k )\\l = 0, implies R T (RR T ) _1 R(a*-/3 fc ) = 0 and left-multiplying 
by R shows 

m<x*-P k )\\ 2 2 = \\y-Rp k \\l = 0. (72) 


Since we impose that (3 k is sparse, (72) means that (3 k converges to a sparse solution of 
y = A 'Ha.* = Ax. 

Recall the RIP, 


(1 - <S a )MH < ||R s o:||| <(14- <$ s )||a|& (73) 

where S s e (0,1) is a constant and R s is a subset of R. (a k — a*) has at most (mt+K*) 


non-zero elements (recall the selection of in (39)) and we assume m* Xk + K* < N c , 
where K* is the number of non-zero elements in ol*. Define 


S = inf{<5 s ,Vs = 1,... ,m* x + K*}. 


(74) 


Imposing the RIP on (72), we have 

(l-5)||a*-j9j!<0. 


The RIP condition required for our proof is 0 < <5. 


m* +K* 


(75) 


< 1. If we select m* Xk = K 


(which is a lower bound of rn*\ k ), we have 0 < 8 2 k* < 1, which is the same as the condition 


derived in [51J. Under this condition, ( f75| ) means 

ll a * — Pk\\\ ~ 0- 


(76) 


This means (3 k converges to ct* and in this case (3 k = ct k . This further means x k converges 
to x* and x k = x k . 
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2) ||R t (RR t ) 1 R(ck* — (3 k ) ||| > 0 and therefore r k < 0 strictly. In this case, 

||oc fc+ i — a£"‘||i — llatfc — Q£*||i = r k < 0 
||a fc+ i ™ = ||a fe - a*||i + r k < \\a k -a.*\\l 

Since ||afc — a *||| > 0, and it is now a decreasing sequence, 

lim ||CKfc — ct* || 2 = Const (> 0), 

k —^oo 

and thus 


(77) 


(78) 


lim r k = 0. (79) 

k —^oo 

Since all the three terms of r k are non-positive, r k —> 0 means all the three terms approach 
zero, specifically 


lim Cl- = 0 and lim ||R T (RR 1 

k —^oo k—> oo 


\-l 


R(<*‘ - At)ll! = o. 


(80) 


lini/^oc Ck — 0 can be obtained via lim fc -s-oo \ k = 0 and in this case lim^oo j3 k = ct k . 


lim^oo ||R t (RR t ) 1 R(q:* — j3 k )\\\ = 0 means lim / .^ oc ||R(o:* — /3 fc )||| = 0, and similar 


to (75), when the RIP on R, 


kJ 112 

is satisfied, 


lim ||a* - P k \\\ = 0. (81) 

k —^oo 

This means j3 k converges to a*, which is equivalent to x k converges to x*. 

Therefore, we have proved in both cases x k (ct k ) monotonically converges to x* (a*). Till now, 
Theorem [I] has been proved. ■ 
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